1,868 research outputs found
DNN adaptation by automatic quality estimation of ASR hypotheses
In this paper we propose to exploit the automatic Quality Estimation (QE) of
ASR hypotheses to perform the unsupervised adaptation of a deep neural network
modeling acoustic probabilities. Our hypothesis is that significant
improvements can be achieved by: i)automatically transcribing the evaluation
data we are currently trying to recognise, and ii) selecting from it a subset
of "good quality" instances based on the word error rate (WER) scores predicted
by a QE component. To validate this hypothesis, we run several experiments on
the evaluation data sets released for the CHiME-3 challenge. First, we operate
in oracle conditions in which manual transcriptions of the evaluation data are
available, thus allowing us to compute the "true" sentence WER. In this
scenario, we perform the adaptation with variable amounts of data, which are
characterised by different levels of quality. Then, we move to realistic
conditions in which the manual transcriptions of the evaluation data are not
available. In this case, the adaptation is performed on data selected according
to the WER scores "predicted" by a QE component. Our results indicate that: i)
QE predictions allow us to closely approximate the adaptation results obtained
in oracle conditions, and ii) the overall ASR performance based on the proposed
QE-driven adaptation method is significantly better than the strong, most
recent, CHiME-3 baseline.Comment: Computer Speech & Language December 201
Automatic Quality Estimation for ASR System Combination
Recognizer Output Voting Error Reduction (ROVER) has been widely used for
system combination in automatic speech recognition (ASR). In order to select
the most appropriate words to insert at each position in the output
transcriptions, some ROVER extensions rely on critical information such as
confidence scores and other ASR decoder features. This information, which is
not always available, highly depends on the decoding process and sometimes
tends to over estimate the real quality of the recognized words. In this paper
we propose a novel variant of ROVER that takes advantage of ASR quality
estimation (QE) for ranking the transcriptions at "segment level" instead of:
i) relying on confidence scores, or ii) feeding ROVER with randomly ordered
hypotheses. We first introduce an effective set of features to compensate for
the absence of ASR decoder information. Then, we apply QE techniques to perform
accurate hypothesis ranking at segment-level before starting the fusion
process. The evaluation is carried out on two different tasks, in which we
respectively combine hypotheses coming from independent ASR systems and
multi-microphone recordings. In both tasks, it is assumed that the ASR decoder
information is not available. The proposed approach significantly outperforms
standard ROVER and it is competitive with two strong oracles that e xploit
prior knowledge about the real quality of the hypotheses to be combined.
Compared to standard ROVER, the abs olute WER improvements in the two
evaluation scenarios range from 0.5% to 7.3%
Smart accessibility patterns and shrinking cities: The added value of urban design
During these last decades, the shrinkage of cities has become a major urban issue, a process caused by many factors but one that will generally increase during the next years. This is mainly because of the trend of urbanization: in 2016 the UN estimated that 54.5% of people live in urban settlements, and that by 2030 it will become 60%. The non-urban areas impacted by depopulation, will face several issues in terms of land maintenance, heritage preservation, and conservation of local traditions.
This dynamic is strongly related to the notion of accessibility, which, here, stands for the possibility of people to access places, spaces, items, and services. This approach tries to include different points of view such as the notion of accessibility seen in transportation terms, based on its efficiency and multimodality, or the issue of accessibility concerning people with disabilities.
The ongoing digital revolution has further impacted the issue of accessibility. The pervasive transition from analogue to digital processes and the development of Information and Communication Technologies has provided new opportunities to supply information, infrastructures, and public services to people. With our smartphones, citizens can access and produce data, which can then be used by them to increase their awareness about urban opportunities and optimize urban projects and policies. Worldwide internet connection has blurred the relation between a place and its use, deepening reuse strategies for buildings and neighbourhoods. The development of shared and circular economy and new health standards in cities has led to the innovation of public services both in an evolutionary way (e.g. water supply and management, waste management) and in a disruptive way (e.g. transportation design, urban hybrid services). Smart Cities projects try to catch most of these opportunities, focusing on innovative urban solutions able to exploit this potential.
This article aims to contribute to this debate, reviewing some of the main definitions of urban accessibility and showing the possible added value given by innovative urban strategies open to ICT solutions. To better understand this approach these notions will be related to Gjirokastra, one of the most important cities in southern Albania. Its distinctive combination in terms of heritage, strategic position and business opportunities are facing urban shrinkage, with the consequent loss of city population, lack of maintenance of its renowned heritage and a declining economy. Then a design proposal that uses the notion of accessibility to analyse and indicate strategic accessibility patterns to challenge shrinkage will be outlined. These actions will be referenced to pilot projects and case studies to prove how innovative urban design can add new value to urban accessibility patterns. The conclusions will resume the role of urban design dealing with these issues, indicating constraints and potentials of this approach
Linguistically Motivated Vocabulary Reduction for Neural Machine Translation from Turkish to English
The necessity of using a fixed-size word vocabulary in order to control the
model complexity in state-of-the-art neural machine translation (NMT) systems
is an important bottleneck on performance, especially for morphologically rich
languages. Conventional methods that aim to overcome this problem by using
sub-word or character-level representations solely rely on statistics and
disregard the linguistic properties of words, which leads to interruptions in
the word structure and causes semantic and syntactic losses. In this paper, we
propose a new vocabulary reduction method for NMT, which can reduce the
vocabulary of a given input corpus at any rate while also considering the
morphological properties of the language. Our method is based on unsupervised
morphology learning and can be, in principle, used for pre-processing any
language pair. We also present an alternative word segmentation method based on
supervised morphological analysis, which aids us in measuring the accuracy of
our model. We evaluate our method in Turkish-to-English NMT task where the
input language is morphologically rich and agglutinative. We analyze different
representation methods in terms of translation accuracy as well as the semantic
and syntactic properties of the generated output. Our method obtains a
significant improvement of 2.3 BLEU points over the conventional vocabulary
reduction technique, showing that it can provide better accuracy in open
vocabulary translation of morphologically rich languages.Comment: The 20th Annual Conference of the European Association for Machine
Translation (EAMT), Research Paper, 12 page
Who Are We Talking About? Handling Person Names in Speech Translation
Recent work has shown that systems for speech translation (ST) – similarly to automatic speech recognition (ASR) – poorly handle person names. This shortcoming does not only lead to errors that can seriously distort the meaning of the input, but also hinders the adoption of such systems in application scenarios (like computer-assisted interpreting) where the translation of named entities, like person names, is crucial. In this paper, we first analyse the outputs of ASR/ST systems to identify the reasons of failures in person name transcription/translation. Besides the frequency in the training data, we pinpoint the nationality of the referred person as a key factor. We then mitigate the problem by creating multilingual models, and further improve our ST systems by forcing them to jointly generate transcripts and translations, prioritising the former over the latter. Overall, our solutions result in a relative improvement in token-level person name accuracy by 47.8% on average for three language pairs (en->es,fr,it)
Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary
We propose a method to transfer knowledge across neural machine translation
(NMT) models by means of a shared dynamic vocabulary. Our approach allows to
extend an initial model for a given language pair to cover new languages by
adapting its vocabulary as long as new data become available (i.e., introducing
new vocabulary items if they are not included in the initial model). The
parameter transfer mechanism is evaluated in two scenarios: i) to adapt a
trained single language NMT system to work with a new language pair and ii) to
continuously add new language pairs to grow to a multilingual NMT system. In
both the scenarios our goal is to improve the translation performance, while
minimizing the training convergence time. Preliminary experiments spanning five
languages with different training data sizes (i.e., 5k and 50k parallel
sentences) show a significant performance gain ranging from +3.85 up to +13.63
BLEU in different language directions. Moreover, when compared with training an
NMT model from scratch, our transfer-learning approach allows us to reach
higher performance after training up to 4% of the total training steps.Comment: Published at the International Workshop on Spoken Language
Translation (IWSLT), 201
Fine-tuning on Clean Data for End-to-End Speech Translation: FBK @ IWSLT 2018
This paper describes FBK's submission to the end-to-end English-German speech
translation task at IWSLT 2018. Our system relies on a state-of-the-art model
based on LSTMs and CNNs, where the CNNs are used to reduce the temporal
dimension of the audio input, which is in general much higher than machine
translation input. Our model was trained only on the audio-to-text parallel
data released for the task, and fine-tuned on cleaned subsets of the original
training corpus. The addition of weight normalization and label smoothing
improved the baseline system by 1.0 BLEU point on our validation set. The final
submission also featured checkpoint averaging within a training run and
ensemble decoding of models trained during multiple runs. On test data, our
best single model obtained a BLEU score of 9.7, while the ensemble obtained a
BLEU score of 10.24.Comment: 6 pages, 2 figures, system description at the 15th International
Workshop on Spoken Language Translation (IWSLT) 201
Does Simultaneous Speech Translation need Simultaneous Models?
In simultaneous speech translation (SimulST), finding the best trade-off
between high translation quality and low latency is a challenging task. To meet
the latency constraints posed by the different application scenarios, multiple
dedicated SimulST models are usually trained and maintained, generating high
computational costs. In this paper, motivated by the increased social and
environmental impact caused by these costs, we investigate whether a single
model trained offline can serve not only the offline but also the simultaneous
task without the need for any additional training or adaptation. Experiments on
en->{de, es} indicate that, aside from facilitating the adoption of
well-established offline techniques and architectures without affecting
latency, the offline solution achieves similar or better translation quality
compared to the same model trained in simultaneous settings, as well as being
competitive with the SimulST state of the art
- …